Chinese-Japanese Clause Alignment

Authors

  • Xiaojie Wang
  • Fuji Ren
Abstract

Bi-text alignment is useful for many natural language processing tasks, such as machine translation, bilingual lexicography, and word sense disambiguation. This paper presents a method for Chinese-Japanese alignment at the clause level. After describing some characteristics of Chinese-Japanese bilingual texts, we first investigate statistical properties of a Chinese-Japanese bilingual corpus, including a correlation test of text lengths between the two languages and a distribution test of the length-ratio data. We then focus on n-m (n>1 or m>1) alignment modes, which are prone to mismatch, and propose a similarity measure based on Hanzi character information for these modes. Using dynamic programming, we combine the statistical information and the Hanzi character information to find the overall least-cost alignment. Experiments show that our algorithm achieves good alignment accuracy.
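
The pipeline outlined in the abstract (a length-based statistical cost, a Hanzi-based similarity for n-m modes, and a dynamic-programming search for the least total cost) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the formulas, so the following are all assumptions made for illustration: the Gale-Church-style length cost, the Dice coefficient over shared ideographs, the mode penalties in `MODES`, the `ratio`, `var`, and `weight` parameters, and the choice to apply the Hanzi term only to n-m modes. The sketch also ignores differences between simplified Chinese and Japanese character forms.

```python
import math

# Hypothetical prior penalties per alignment mode (n Chinese clauses : m Japanese clauses).
MODES = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.5, (1, 2): 1.5, (2, 2): 2.5}


def hanzi_similarity(zh: str, ja: str) -> float:
    """Dice coefficient over the sets of CJK ideographs in the two strings (assumed measure)."""
    def ideographs(text):
        return {ch for ch in text if "\u4e00" <= ch <= "\u9fff"}
    a, b = ideographs(zh), ideographs(ja)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def length_cost(len_zh: int, len_ja: int, ratio: float = 1.0, var: float = 1.0) -> float:
    """Negative log of a two-sided normal tail probability for the length difference."""
    if len_zh == 0 and len_ja == 0:
        return 0.0
    mean = (len_zh + len_ja / ratio) / 2
    delta = (len_ja - len_zh * ratio) / math.sqrt(mean * var)
    prob = math.erfc(abs(delta) / math.sqrt(2))  # equals 2 * (1 - Phi(|delta|))
    return -math.log(max(prob, 1e-12))


def align(zh, ja, weight: float = 2.0):
    """Return the least-cost list of (zh_indices, ja_indices) beads via dynamic programming."""
    n, m = len(zh), len(ja)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in MODES.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                z, a = "".join(zh[i:ni]), "".join(ja[j:nj])
                step = penalty + length_cost(len(z), len(a))
                if di > 1 or dj > 1:
                    # Assumption: the Hanzi term is applied only to n-m modes.
                    step += weight * (1.0 - hanzi_similarity(z, a))
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end of both texts to recover the beads.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling `align(zh_clauses, ja_clauses)` on two lists of clause strings returns beads such as `([0], [0])` for a 1-1 match or `([1, 2], [1])` for a 2-1 match. In practice the length ratio and variance would be estimated from the corpus, which is the role the abstract assigns to the correlation and distribution tests.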

Similar resources

Japanese-Chinese Phrase Alignment Exploiting Shared Chinese Characters

Common Chinese characters between Japanese and Chinese have been shown to be effective in Japanese-Chinese phrase alignment. Besides common Chinese characters, Japanese and Chinese also share many other semantically equivalent Chinese characters. However, there are no available resources for this kind of character. In this paper, we propose a statistical method aiming to detect these ...

Japanese-Chinese Phrase Alignment Using Common Chinese Characters Information

We describe a method to automatically detect common Chinese characters between Japanese and Chinese by means of freely available resources, and we verify the effectiveness of the detection method. We use a joint phrase alignment model on dependency trees and report results of experiments aimed at improving the alignment quality between Japanese and Chinese by incorporating the common Chinese charac...

Bursty Topics in Time Series Japanese / Chinese News Streams and their Cross-Lingual Alignment

This paper studies issues regarding topic modeling of information flow in multilingual news streams. If someone wants to find differences in the topics of Japanese news and Chinese news, it is usually necessary for him/her to carefully watch every article in Japanese and Chinese news streams at every moment. In such a situation, topic models such as LDA (Latent Dirichlet Allocation) and DTM (dy...

Alignment and Word Order in Old Japanese

Keywords: Active Alignment, Ergative Alignment, Split Intransitivity, Case, Nominalization, Verbal Prefixes, Clitic Pronouns, Nominal Hierarchy

This paper argues that Old Japanese (eighth century) had split alignment, with nominative-accusative alignment in main clauses and active alignment in nominalized clauses. The main arguments for active alignment in nominalized clauses come from ga-marking of active subjects and the distribution of two verbal prefixes: i- for active predicates and sa- for inactive predicates (cf. Yanagida, In: Hasega...

Publication date: 2005